ABSTRACT


Human emotions are mental states that arise spontaneously rather than through conscious
effort, and are accompanied by physiological changes in the facial muscles that produce
expressions. Nonverbal communication channels such as facial expressions, eye movements,
and gestures are used in many applications of human-computer interaction. Identifying
emotions is not an easy task because there is often no clear-cut difference between the
facial expressions of different emotions, and expressions are complex and highly variable.
Machine learning algorithms model the face using a set of features.
Automatic emotion recognition based on facial expression is an interesting research
field, which has been presented and applied in several areas such as safety, health and
human-machine interfaces. Researchers in this field are interested in developing
techniques to interpret and code facial expressions and to extract these features in order
to obtain better predictions by computer. Machine learning, one of the top emerging
sciences, has an extensive range of applications. In this paper, optimization-based
feature extraction techniques are used to enhance the recognition of human emotion from
facial images. The optimization techniques used are Ant Colony Optimization, Particle
Swarm Optimization, and the Genetic Algorithm. Several metrics are used to evaluate the
performance of the feature extraction techniques for emotion recognition.
1. INTRODUCTION


One envisaged aim of artificial intelligence is to make the interaction between humans and
next-generation computing systems more natural. To achieve efficient and smooth interaction
between humans and computer systems, several aspects of human behavior should be taken
into account. One of the most important concerns the emotional behavior and the
affective state of the human. Next-generation human-centered computing systems should
possess the capacity to perceive, accurately analyze and deeply understand emotions as
communicated by social and affective channels [1].


Emotions constitute an innate and important aspect of human behavior that colors the way
people communicate. Humans express their inner states through various channels, such as
body language and facial expressions. Facial expressions are the most direct and meaningful
channel of non-verbal communication. They form a universal language of emotions that can
instantly express a wide range of human emotional states, feelings and attitudes, and they
assist in various cognitive tasks. The accurate analysis and interpretation of the emotional
content of human facial expressions is essential for a deeper understanding of human
behavior. Indeed, facial expressions are the most cogent and natural means for human beings
to communicate emotions, comprehension, and intentions and to regulate interactions and
communication with other people [1] [2].


Facial expressions considerably assist in direct communication. It has been reported
that during face-to-face human communication, 7% of the information is communicated by the
linguistic part, such as the spoken words, 38% is communicated by paralanguage, such as the
vocal part, and 55% is communicated by the facial expressions [3]. Indeed, even a simple signal
such as a head nod or a smile can convey a large number of meanings [4] [5]. In general, facial
expressions are the most natural, meaningful and important channel of human
interaction and communication.


The recognition of facial expressions is useful in a wide spectrum of systems and
applications and is necessary for achieving naturalistic interaction. Because facial
expressions assist in various cognitive tasks, reading and interpreting the emotional content
of human expressions is essential for a deeper understanding of the human condition. The main
aim of facial expression recognition methods is therefore to enable machines to automatically
estimate the emotional content of a human face. Giving computer applications the ability to
recognize the emotional state of humans from their facial expressions is an important and
challenging task with wide-ranging applications [6].


2. RELATED WORKS 


Ngo, Quan T., and Seokhoon Yoon [6] applied deep learning techniques, and proposed a novel 
loss function called weighted-cluster loss, which is used during the fine-tuning phase. 
Specifically, the weighted-cluster loss function simultaneously improves the intra-class 
compactness and the inter-class separability by learning a class center for each emotion class. 
It also takes the imbalance in a facial expression dataset into account by giving each emotion 
class a weight based on its proportion of the total number of images. 


Abdulrazaq, Maiwan B., et al [7] examined the accuracy of six classifiers combined with
the Relief-F feature selection method, relying on the minimum number of attributes. The
classifiers examined are Multi-Layer Perceptron, Random Forest, Decision Tree, Support
Vector Machine, K-Nearest Neighbor, and Radial Basis Function.


Liu, Xiaoqian, and Fengyu Zhou [8] applied a curriculum learning strategy to the training
stage of facial expression recognition and proposed a novel curriculum design method.
The system first employs an unsupervised density-distance clustering method to
determine the clustering center of each category. Then, the dataset is divided into three subsets
of varying complexity according to the distance from each sample to the clustering center in the
feature space. Importantly, the authors developed a multistage training process in which a main
model is trained by continuously adding harder samples to the training set to increase the
complexity.

Zheng, Hao, et al [9] proposed a discriminative DMTL (DDMTL) facial expression
recognition method, which addresses the shortcomings of earlier DMTL approaches by
considering both the class label information and the samples' local spatial distribution
information simultaneously. The authors further designed a siamese network to evaluate the
local spatial distribution through an adaptive reweighting module, utilizing the class label
information with different confidences.


Caroppo, Andrea, Alessandro Leone, and Pietro Siciliano [10] observed that studies exploring
the performance of existing deep architectures for classifying the expressions of ageing adults
are absent from the literature. To fill this gap, the authors considered the
performance of three recent deep convolutional neural network models (VGG-16, AlexNet
and GoogLeNet/Inception V1) and evaluated them on four different benchmark datasets (FACES,
Lifespan, CIFE, and FER2013) which also contain facial expressions performed by elderly
subjects. As a baseline for comparison, two traditional machine
learning approaches based on handcrafted feature extraction were evaluated on the same
datasets. The experimentation focused on the concept of
"transfer learning", which consists of replacing the output level of the deep architectures
considered with new output levels appropriate to the number of classes (facial expressions), and
training three different classifiers (i.e., Random Forest, Support Vector Machine and Linear
Regression). The VGG-16 deep architecture in combination with the Random Forest classifier was
found to be the best in terms of accuracy for each dataset and for each considered age group.


Zhou, Linyi, et al [11] developed a 3D attention mechanism for feature refinement which
selectively focuses on attentive channel entries and salient spatial regions of a convolutional
neural network feature map. Moreover, a deep metric loss termed Triplet-Center (TC) loss is
incorporated to further enhance the discriminative power of the deeply-learned features with an 
expression-similarity constraint. It simultaneously minimizes intra-class distance and 
maximizes inter-class distance to learn both compact and separate features. 


Liu, Yuanyuan, et al [12] proposed a dynamic multi-channel metric learning network for 
pose-aware and identity-invariant FER, called DML-Net, which can reduce the effects of pose 
and identity for robust FER performance. Specifically, DML-Net uses three parallel
multi-channel convolutional networks to learn fused global and local features from different
facial regions. Then it uses joint embedded feature learning to explore identity-invariant and
pose-aware expression representations from fused region-based features in an embedding space.
DML-Net is end-to-end trainable by minimizing deep multiple metric losses, FER loss, and 
pose estimation loss with dynamically learned loss weights, thereby suppressing overfitting and 
significantly improving recognition. 

Najar, Fatma, et al [13] addressed the problem of human activity and facial expression
recognition by investigating the effectiveness of Bayesian inference methods. A novel
method termed Bayesian learning for finite multivariate generalized Gaussian mixture models
is developed. The choice of the multivariate generalized Gaussian distribution is motivated by
its ability to model a large range of data and by its shape flexibility. The authors developed
a Markov Chain Monte Carlo within Metropolis-Hastings algorithm for the proposed generative
model. The authors also tackled some key issues related to machine learning and pattern
recognition, such as the estimation of the statistical model's parameters.

Xie, Yuan, et al [14] proposed a novel Adversarial Graph Representation Adaptation
(AGRA) framework that unifies graph representation propagation with adversarial learning for
cross-domain holistic-local feature co-adaptation. To achieve this, the authors first built a graph
to correlate holistic and local regions within each domain and another graph to correlate these
regions across different domains. Then, the authors learned the per-class statistical distribution
of each domain and extracted holistic-local features from the input image to initialize the
corresponding graph nodes. Finally, the authors introduced two stacked graph convolution
networks to propagate holistic-local features within each domain to explore their interaction and
across different domains for holistic-local feature co-adaptation.


Jiaming, Tang, et al [15] proposed a multi-scale and multi-region vector triangle texture
feature extraction scheme based on a weakly supervised clustering algorithm. The best scale
for the vector triangle texture features is explored according to the information gain rate
of the extracted features, combined with threshold selection and a random dropout strategy.
The feature space is then optimized while retaining sufficient information, so that its size
and its information redundancy are both reduced. For the positive and negative expression
units, the facial expression images in the data set are divided into two categories.


3. ANT COLONY OPTIMIZATION 


The underlying metaphor of ant colony optimization (ACO) [16] is the way that some insects
living in collaborative colonies look for food. When an ant nest senses a food source,
several expeditions of ants set out by different paths to search for it, each leaving a
pheromone trail. A pheromone is a chemical substance that many animals produce and that is
especially important for insects. The pheromone trail is a signal to other ants, which
recognize the path followed by their predecessors. Among all the expeditions, those that took
the shortest path reach the food source first and therefore return to the nest sooner than the
others. The pheromone trail of the shortest path is thus reinforced, so new expeditions are
more likely to take that path than the others, unless some expedition finds a better path (or
part of a path). Over time, the pheromone trail of the shortest path is expected to become
more and more intense, while the trails on the other paths evaporate.


When applying this principle to combinatorial optimization problems, we look for an
implementation that reinforces good solutions, or parts of solutions,
by intensifying a "pheromone" value that controls the probability of choosing that
solution or part of a solution. This probability depends not only on the pheromone
value, but also on the value of a "local heuristic" or "short-term vision" that suggests a
solution or part of a solution by a local optimization criterion, as greedy algorithms do.

An optimization method that uses ACO has at least the following components:

• A representation that enables the construction or modification of solutions by means of a
probabilistic transition rule, which depends on the pheromone trail and the local heuristic.

• A local heuristic or visibility, denoted η.

• An update rule for the pheromone, denoted τ.

• A probabilistic transition rule that depends on η and τ.
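
The components listed above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in this paper: it follows the common Ant System form, in which the probability of choosing a component is proportional to τ^α · η^β and the pheromone evaporates at rate ρ before good solutions are reinforced. The parameters alpha, beta and rho and all function names are assumptions introduced here for illustration.

```python
import random

def transition_probabilities(tau, eta, candidates, alpha=1.0, beta=2.0):
    """Probability of picking each candidate component.

    tau[j] is the pheromone value and eta[j] the local heuristic of component j.
    alpha and beta (assumed parameters) weight the two influences.
    """
    weights = [(tau[j] ** alpha) * (eta[j] ** beta) for j in candidates]
    total = sum(weights)
    return [w / total for w in weights]

def choose_component(tau, eta, candidates, rng, alpha=1.0, beta=2.0):
    """Sample one candidate according to the probabilistic transition rule."""
    probs = transition_probabilities(tau, eta, candidates, alpha, beta)
    return rng.choices(candidates, weights=probs, k=1)[0]

def update_pheromone(tau, solutions, rho=0.1):
    """Evaporate every trail, then reinforce the components of each solution.

    solutions is a list of (components, quality) pairs; rho is the assumed
    evaporation rate.
    """
    new_tau = [(1.0 - rho) * t for t in tau]
    for components, quality in solutions:
        for j in components:
            new_tau[j] += quality
    return new_tau
```

With equal pheromone on every component, the choice is driven by the local heuristic alone; as good solutions deposit pheromone, their components become progressively more likely to be selected.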


4. PARTICLE SWARM OPTIMIZATION 


Particle Swarm Optimization (PSO) is a population-based stochastic optimization technique
inspired by the social behavior of bird flocking. PSO was proposed by Eberhart and Kennedy in
1995 [17]. It is a metaheuristic, as it can explore a search space while making few or no prior
assumptions about the given problem, and it can converge to an optimal or near-optimal solution.
The candidate solutions, referred to as particles
in the technique, fly around in a multi-dimensional search space to find an optimal or
sub-optimal solution by competition as well as by cooperation among them. Like GA, PSO is
initialized with a group of random particles and then looks for optima through the movement
of candidate solutions in the search space. Each particle is represented by a vector
$x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, where D represents the number of features in the
dataset. Each particle also has a D-dimensional velocity $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$.
In every iteration, each particle is updated using three values: (1) the previous velocity,
which gives the trend of flow of the particle over the search space; (2) pbest, the best
position (by fitness) found by that particle up to the present iteration; and (3) gbest, the
best position found by the whole swarm up to the present iteration. The position and velocity
of the particles are updated using the following equations:


$$v_{id}^{k+1} = w \, v_{id}^{k} + c_1 r_1 \left(p_{id} - x_{id}^{k}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{k}\right) \tag{1}$$

$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1} \tag{2}$$


Here k represents the kth iteration and d represents the dth feature in the vector. w represents
the inertia factor, which assigns a weight to the impact of the previous velocity. c1 and c2 are
acceleration constants. r1 and r2 are random numbers in the range [0, 1]. $p_{id}$ and $p_{gd}$
denote the state of the dth feature in pbest and gbest.
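
As a minimal sketch of equations (1) and (2), the following Python function performs one update of a single particle. The function name and the default values chosen for w, c1 and c2 are assumptions made for illustration only; the paper does not state the values it used.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One PSO update of a particle's velocity (Eq. 1) and position (Eq. 2).

    x, v, pbest and gbest are lists of length D. The defaults for w, c1 and
    c2 are illustrative assumptions, not values taken from the paper.
    """
    rng = rng or random
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # fresh random numbers in [0, 1)
        vd = (w * v[d]
              + c1 * r1 * (pbest[d] - x[d])   # pull towards the particle's best
              + c2 * r2 * (gbest[d] - x[d]))  # pull towards the swarm's best
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v
```

When a particle already sits on both pbest and gbest, the two attraction terms vanish and only the inertia term w · v remains, which is why w controls how much the previous velocity carries over.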


5. GENETIC ALGORITHM 


The Genetic Algorithm (GA) is a popular evolutionary computation method developed by Holland
in 1975 and later enhanced by Goldberg [18]. It is a global search technique that solves a given
problem by mimicking the natural process of evolution. Based on Darwin's theory, GA utilizes
the concepts of reproduction and survival of the fittest. GA searches for new and better
solutions without any presumption such as continuity or unimodality. GA has large
potential as a process, and over the years it has been used for design, for optimizing
telecommunication, traffic and shipment routing, for gaming, for market and financial analysis
and for many other tasks. Its growing use in different sectors is due to the fact that GA can
handle a large number of parameters and returns a solution that is satisfactory, though it
may not be the best.


GA operates on a set of solutions, called chromosomes or individuals, which are strings of
binary values, "0"s and "1"s. Each value ("0" or "1") determines the state of an attribute in
the chromosome. A set of such chromosomes is referred to as a population. Each chromosome is
evaluated using a fitness function. After the chromosomes are ranked according to their
fitness values, they undergo genetic operations such as crossover and mutation. For this, two
chromosomes are selected on the basis of their positions on a roulette wheel (biased according
to each chromosome's fitness). The two chromosomes first go through crossover, and then
mutation is applied to increase the local coverage of the search space by the chromosomes,
thereby decreasing the chances of being stuck at a local optimum. If the evolution process
generates offspring chromosomes that are stronger than the previous ones, the algorithm
replaces the previous ones with them. The evolution process repeats until it meets the end
criteria.
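
The loop described above (roulette-wheel selection, crossover, mutation, replacement by stronger offspring) can be sketched as follows. This is a hedged illustration rather than the paper's implementation: one-point crossover, bit-flip mutation, the mutation rate and the "keep the fittest of parents plus offspring" replacement are all assumptions chosen for simplicity, and every function name is hypothetical.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    # A tiny offset keeps the wheel valid if every fitness happens to be zero.
    weights = [f + 1e-9 for f in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]

def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def evolve(population, fitness, rng, mutation_rate=0.05):
    """One generation: select, cross over, mutate, keep the strongest."""
    fitnesses = [fitness(c) for c in population]
    children = []
    while len(children) < len(population):
        p1 = roulette_select(population, fitnesses, rng)
        p2 = roulette_select(population, fitnesses, rng)
        c1, c2 = one_point_crossover(p1, p2, rng)
        children.append(mutate(c1, mutation_rate, rng))
        children.append(mutate(c2, mutation_rate, rng))
    # Assumed replacement rule: offspring replace parents only if stronger.
    pool = population + children[:len(population)]
    pool.sort(key=fitness, reverse=True)
    return pool[:len(population)]
```

For feature selection, each bit of a chromosome would mark whether the corresponding feature is kept, and the fitness function would score the resulting feature subset, for example by classifier accuracy on held-out data.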


6. RESULT AND DISCUSSION 


6.1. Image Dataset 


The face emotion recognition dataset is taken from the Kaggle repository [19]. The dataset
contains images labelled angry, disgust, fear, happy, neutral, sad and surprise. For this
paper, 100 images from each emotion category are used to evaluate the performance of the
optimization-based feature extraction techniques (PSO, GA and ACO) with three
classification techniques: ANN, KNN and SVM.

[Figure: sample images for each emotion category (Angry, Disgust, Fear, Happy, Neutral, Sad, Surprise)]





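6.2. Performance Metrics
Table 1 lists the performance metrics used in this research paper. TP, TN, FP and FN denote
the numbers of true positives, true negatives, false positives and false negatives.


Table 1 Performance Metrics used in this paper


Performance Metrics      Formula
Detection Rate           (TP + TN) / (TP + TN + FP + FN)
Sensitivity              TP / (TP + FN)
Specificity              TN / (TN + FP)
False Positive Rate      1 - Specificity
Miss Rate                1 - Sensitivity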
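
The metrics of Table 1 follow directly from the four confusion-matrix counts. The sketch below is illustrative and rests on two assumptions: that the Detection Rate is the overall accuracy (TP + TN) / (TP + TN + FP + FN), as the denominator in Table 1 suggests, and that for a multi-class problem the counts are taken one emotion class at a time (one-vs-rest). The function name is hypothetical.

```python
def performance_metrics(tp, tn, fp, fn):
    """Compute the Table 1 metrics (as fractions) from confusion counts.

    Detection rate is assumed here to be overall accuracy.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "detection_rate": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1.0 - specificity,
        "miss_rate": 1.0 - sensitivity,
    }
```

Because the false positive rate and the miss rate are complements of specificity and sensitivity, a technique that improves the latter two necessarily reduces the former two by the same amount.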

Table 2 shows the Detection Rate (in %) obtained by the optimization-based
feature extraction techniques using the Artificial Neural Network (ANN),
Convolutional Neural Network (CNN) and K-Nearest Neighbor (KNN) classifiers. Figure 2 gives the
graphical representation of the same results. From Table 2 and Figure 2, it is clear that ACO
with CNN gives a higher detection rate than the other feature extraction techniques.


Table 2 Detection Rate (in %) obtained by the Optimization based Feature Extraction techniques
using CNN, ANN and KNN classifier


Feature Extraction Detection Rate obtained (in %) by Classification Techniques


Techniques ANN 


55.67 53.62 7 


56.64 53.69 51.23 
64.58 63.37 49.25 


Figure 2: Graphical representation of the Detection Rate (in %) obtained by the 
Optimization based Feature Extraction techniques using CNN, ANN and KNN classifier 





Table 3 shows the Sensitivity (in %) obtained by the optimization-based
feature extraction techniques using the CNN, ANN and KNN classifiers.
Figure 3 gives the graphical representation of the same results. From Table 3 and Figure 3,
it is clear that ACO with CNN gives a higher sensitivity than the other feature extraction
techniques.


Table 3 Sensitivity (in %) obtained by the Optimization based Feature Extraction
techniques using CNN, ANN and KNN classifier


Feature Extraction 
Techniques a 
Pp Ga 55348 


Figure 3: Graphical representation of the Sensitivity (in %) obtained by the Optimization
based Feature Extraction techniques using CNN, ANN and KNN classifier


Table 4 shows the Specificity (in %) obtained by the optimization-based feature extraction
techniques using the CNN, ANN and KNN classifiers. Figure 4 gives the graphical representation
of the same results. From Table 4 and Figure 4, it is clear that ACO with CNN gives a higher
specificity than the other feature extraction techniques.





Table 4 Specificity (in %) obtained by the Optimization based Feature Extraction techniques using
CNN, ANN and KNN classifier


Feature Extraction Specificity obtained by Classification Techniques 
Techniques 


55.54 53.71 52.32 
56.45 51.68 47.81 
ACO 63.76 58.58 55.82 

Figure 4: Graphical representation of the Specificity (in %) obtained by the Optimization
based Feature Extraction techniques using CNN, ANN and KNN classifier


Table 5 shows the False Positive Rate (in %) obtained by the optimization-based feature
extraction techniques using the CNN, ANN and KNN classifiers. Figure 5 gives the graphical
representation of the same results. From Table 5 and Figure 5, it is clear that ACO with CNN
gives a lower false positive rate than the other feature extraction techniques.


Table 5 False Positive Rate (in %) obtained by the Optimization based Feature Extraction techniques 
using CNN, ANN and KNN classifier 


Feature Extraction False Positive Rate obtained by Classification Techniques 


Techniques ANN 


44.46 46.29 47.68 


43.55 48.32 52.19 
36.24 41.42 44.18 


Table 6 shows the Miss Rate (in %) obtained by the optimization-based feature extraction
techniques using the CNN, ANN and KNN classifiers. Figure 6 gives the graphical representation
of the same results. From Table 6 and Figure 6, it is clear that ACO with CNN gives a lower
miss rate than the other feature extraction techniques.





Table 6 Miss Rate (in %) obtained by the Optimization based Feature Extraction techniques using 
CNN, ANN and KNN classifier 


Feature Extraction Miss Rate obtained by Classification Techniques 


Techniques KNN 


46.66 48.32 50.74 
44.66 44.19 51.76 
32.17 40.46 43.52 


Figure 6: Graphical representation of the Miss Rate (in %) obtained by the Optimization
based Feature Extraction techniques using CNN, ANN and KNN classifier





7. CONCLUSION 


The accurate analysis and interpretation of the emotional content of human facial expressions
is essential for a deeper understanding of human behavior. Although a human can detect and
interpret faces and facial expressions naturally, with little or no effort, accurate and robust
facial expression recognition by computer systems is still a great challenge. With an effective
feature extraction technique, different facial expressions can be classified more reliably into
their appropriate classes. In this research paper, optimization-based feature extraction
techniques are used to enhance the classification of human facial expressions. In the results
reported above, ACO with the CNN classifier gave a higher detection rate, specificity, and
sensitivity, and a lower false positive rate and miss rate, than with the other
classification techniques. ACO-based feature extraction also performed better than the other
optimization-based feature extraction techniques, PSO and GA.